Overview
Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 14776615 |
| Missing cells | 7326379 |
| Missing cells (%) | 3.5% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 4.1 GiB |
| Average record size in memory | 295.0 B |
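The summary figures above can be cross-checked against one another. The sketch below uses only numbers quoted in this report to recompute the missing-cell percentage and the approximate total memory footprint; the variable names are illustrative.

```python
# Cross-check the dataset statistics using only values quoted in the report.
n_variables = 14
n_observations = 14_776_615
missing_cells = 7_326_379
avg_record_bytes = 295.0

# Total number of cells is rows x columns.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Approximate total size, converting bytes to GiB (2**30 bytes).
total_gib = avg_record_bytes * n_observations / 2**30

print(f"Missing cells: {missing_pct:.1f}%")   # Missing cells: 3.5%
print(f"Total size: {total_gib:.1f} GiB")     # Total size: 4.1 GiB
```

Both results agree with the table, up to the rounding of the average record size.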
Variable types
| Text | 2 |
|---|---|
| Categorical | 2 |
| Numeric | 8 |
| DateTime | 2 |
Alerts
| Alert | Type |
|---|---|
| dropoff_latitude is highly overall correlated with pickup_latitude and 1 other fields | High correlation |
| passenger_count is highly overall correlated with store_and_fwd_flag | High correlation |
| pickup_latitude is highly overall correlated with dropoff_latitude and 2 other fields | High correlation |
| pickup_longitude is highly overall correlated with pickup_latitude | High correlation |
| store_and_fwd_flag is highly overall correlated with dropoff_latitude and 3 other fields | High correlation |
| trip_distance is highly overall correlated with trip_time_in_secs | High correlation |
| trip_time_in_secs is highly overall correlated with trip_distance | High correlation |
| vendor_id is highly overall correlated with store_and_fwd_flag | High correlation |
| store_and_fwd_flag is highly imbalanced (84.7%) | Imbalance |
| store_and_fwd_flag has 7326207 (49.6%) missing values | Missing |
| rate_code is highly skewed (γ1 = 195.533726) | Skewed |
| pickup_latitude is highly skewed (γ1 = -127.8099616) | Skewed |
| dropoff_latitude is highly skewed (γ1 = -141.4446314) | Skewed |
| pickup_longitude has 267494 (1.8%) zeros | Zeros |
| pickup_latitude has 265104 (1.8%) zeros | Zeros |
| dropoff_longitude has 275657 (1.9%) zeros | Zeros |
| dropoff_latitude has 273357 (1.8%) zeros | Zeros |
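The Zeros and Skewed alerts on the coordinate columns may indicate placeholder or corrupted GPS values, such as exact zeros or latitudes in the thousands. A minimal sketch of a row-level plausibility check follows. The function name `is_plausible_coordinate` is hypothetical, and the rule itself (reject missing values, exact zeros, and anything outside the valid degree ranges) is an assumption about what counts as bad data, not something the report prescribes.

```python
from typing import Optional


def is_plausible_coordinate(longitude: Optional[float], latitude: Optional[float]) -> bool:
    """Return True if the pair is present, non-zero, and within valid degree ranges."""
    # Missing values cannot be validated.
    if longitude is None or latitude is None:
        return False
    # Exact zeros are flagged by the report; treated here as placeholders (assumption).
    if longitude == 0 or latitude == 0:
        return False
    # Valid longitudes span [-180, 180] and valid latitudes span [-90, 90].
    return -180.0 <= longitude <= 180.0 and -90.0 <= latitude <= 90.0


# Values taken from the report: a typical pickup, a zero pair, and two reported extremes.
print(is_plausible_coordinate(-73.978165, 40.757977))  # True
print(is_plausible_coordinate(0.0, 0.0))                # False
print(is_plausible_coordinate(-2771.2854, 40.757977))   # False
print(is_plausible_coordinate(-73.978165, 3310.3645))   # False
```

A tighter bounding box around the expected service area would catch more errors, but its limits would have to be chosen by the analyst.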
Reproduction
| Analysis started | 2025-10-28 01:06:34.028097 |
|---|---|
| Analysis finished | 2025-10-28 01:17:05.672391 |
| Duration | 10 minutes and 31.64 seconds |
| Software version | ydata-profiling v4.17.0 |
| Download configuration | config.json |
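For readers who want to generate a comparable report, the following is a minimal sketch using the ydata-profiling Python package named above. The helper name `build_profile`, the report title, and the assumption that the data is stored as a CSV file are all placeholders, not details taken from this report; the settings actually used are those in config.json.

```python
def build_profile(csv_path: str, html_path: str) -> None:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Parse the two timestamp columns so they are profiled as dates.
    df = pd.read_csv(csv_path, parse_dates=["pickup_datetime", "dropoff_datetime"])

    # Default settings are assumed here; the original run used its own configuration.
    profile = ProfileReport(df, title="Trip data profile")
    profile.to_file(html_path)


# Example call with placeholder paths:
# build_profile("trips.csv", "report.html")
```

With roughly 14.8 million rows and about 4.1 GiB in memory, a run of this size needs a machine with ample RAM; the analysis above took about ten and a half minutes.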
Variables
medallion
Text
| Distinct | 13426 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 GiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 32 |
| Min length | 32 |
Unique
| Unique | 33 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 89D227B655E5C82AECF13C3F540D4CF4 |
|---|---|
| 2nd row | 0BD7C8F5BA12B88E0B67BED28BEA73D8 |
| 3rd row | 0BD7C8F5BA12B88E0B67BED28BEA73D8 |
| 4th row | DFD2202EE08F7A8DC9A57B02ACB81FE2 |
| 5th row | DFD2202EE08F7A8DC9A57B02ACB81FE2 |
| Value | Count | Frequency (%) |
| 7e1346f23960cc18d7d129fa28b63a75 | 2137 | < 0.1% |
| 6ffcf7a4f34ba44239636028e680e438 | 2112 | < 0.1% |
| a979cda04cfb8ba3d3acba7e8d7f0661 | 2039 | < 0.1% |
| d5c7cd37ea4d372d00f0a681cdc93f11 | 1959 | < 0.1% |
| 849e486825860106403fb991a763bcc3 | 1957 | < 0.1% |
| 6fe6dff9a59c0b64be0ca64ee2699f08 | 1941 | < 0.1% |
| 06c961ebe7ef4d13f3ae6c005ee0f501 | 1893 | < 0.1% |
| 22908753e00888cc219c875c8d5bc4f6 | 1886 | < 0.1% |
| e6101a0f85312c49a5b5950e61d284dc | 1882 | < 0.1% |
| 6403bf98e4618e21c795c3b45a636d77 | 1882 | < 0.1% |
| Other values (13416) | 14756927 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| E | 29839347 | 6.3% |
| 4 | 29833374 | 6.3% |
| A | 29731461 | 6.3% |
| 9 | 29695897 | 6.3% |
| 5 | 29640335 | 6.3% |
| F | 29609660 | 6.3% |
| D | 29606522 | 6.3% |
| 7 | 29545983 | 6.2% |
| 6 | 29539893 | 6.2% |
| 2 | 29522375 | 6.2% |
| Other values (6) | 176286833 | 37.3% |
hack_license
Text
| Distinct | 32224 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 GiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 32 |
| Min length | 32 |
Unique
| Unique | 182 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | BA96DE419E711691B9445D6A6307C170 |
|---|---|
| 2nd row | 9FD8F69F0804BDB5549F40E9DA1BE472 |
| 3rd row | 9FD8F69F0804BDB5549F40E9DA1BE472 |
| 4th row | 51EE87E3205C985EF8431D850C786310 |
| 5th row | 51EE87E3205C985EF8431D850C786310 |
| Value | Count | Frequency (%) |
| 00b7691d86d96aebd21dd9e138f90840 | 1933 | < 0.1% |
| f49fd0d84449ae7f72f3bc492cd6c754 | 1616 | < 0.1% |
| 51c1be97280a80ebfa8dad34e1956cf6 | 1603 | < 0.1% |
| 847349f8845a667d9ac7cdedd1c873cb | 1570 | < 0.1% |
| ce625fd96d0fafc812a6957139b354a1 | 1557 | < 0.1% |
| 3d757e111c78f5cac83d44a92885d490 | 1514 | < 0.1% |
| 22ca618759c716436ea3257480199a32 | 1501 | < 0.1% |
| 3aab94ca53fe93a64811f65690654649 | 1486 | < 0.1% |
| e66e58207128619cff2d2e2c3c7ecc08 | 1442 | < 0.1% |
| c9674190984ba193ffd8ddcc019804cf | 1390 | < 0.1% |
| Other values (32214) | 14761003 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 29776443 | 6.3% |
| E | 29713235 | 6.3% |
| 8 | 29655194 | 6.3% |
| 5 | 29608188 | 6.3% |
| 0 | 29603323 | 6.3% |
| D | 29589934 | 6.3% |
| 3 | 29585835 | 6.3% |
| 7 | 29576279 | 6.3% |
| F | 29553071 | 6.2% |
| B | 29539186 | 6.2% |
| Other values (6) | 176650992 | 37.4% |
vendor_id
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 732.8 MiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | CMT |
|---|---|
| 2nd row | CMT |
| 3rd row | CMT |
| 4th row | CMT |
| 5th row | CMT |
Common Values
| Value | Count | Frequency (%) |
| CMT | 7450899 | 50.4% |
| VTS | 7325716 | 49.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| T | 14776615 | 33.3% |
| C | 7450899 | 16.8% |
| M | 7450899 | 16.8% |
| V | 7325716 | 16.5% |
| S | 7325716 | 16.5% |
rate_code
Real number (ℝ)
Skewed
| Distinct | 14 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0342732 |
| Minimum | 0 |
|---|---|
| Maximum | 210 |
| Zeros | 667 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 210 |
| Range | 210 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.33877148 |
|---|---|
| Coefficient of variation (CV) | 0.32754545 |
| Kurtosis | 113260.8 |
| Mean | 1.0342732 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 195.53373 |
| Sum | 15283057 |
| Variance | 0.11476612 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 1 | 14456067 | 97.8% |
| 2 | 239160 | 1.6% |
| 5 | 39889 | 0.3% |
| 4 | 22831 | 0.2% |
| 3 | 17655 | 0.1% |
| 0 | 667 | < 0.1% |
| 6 | 315 | < 0.1% |
| 210 | 11 | < 0.1% |
| 8 | 10 | < 0.1% |
| 128 | 4 | < 0.1% |
| Other values (4) | 6 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 667 | < 0.1% |
| 1 | 14456067 | 97.8% |
| 2 | 239160 | 1.6% |
| 3 | 17655 | 0.1% |
| 4 | 22831 | 0.2% |
| 5 | 39889 | 0.3% |
| 6 | 315 | < 0.1% |
| 7 | 2 | < 0.1% |
| 8 | 10 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 210 | 11 | < 0.1% |
| 128 | 4 | < 0.1% |
| 65 | 1 | < 0.1% |
| 28 | 2 | < 0.1% |
| 9 | 1 | < 0.1% |
| 8 | 10 | < 0.1% |
| 7 | 2 | < 0.1% |
| 6 | 315 | < 0.1% |
| 5 | 39889 | 0.3% |
| 4 | 22831 | 0.2% |
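The Frequency (%) column in the value tables is each count divided by the number of observations. The sketch below shows that calculation for a few rate_code rows. The helper name `format_frequency` is hypothetical, and the cut-off used to print "< 0.1%" for very small shares is an assumption chosen to mimic the report's display, not a confirmed detail of ydata-profiling.

```python
N_OBSERVATIONS = 14_776_615  # number of observations from the report overview


def format_frequency(count: int, total: int = N_OBSERVATIONS) -> str:
    """Format count/total as a percentage with one decimal place."""
    pct = 100 * count / total
    # Assumed display rule: non-zero shares that would round to 0.0% show as "< 0.1%".
    if 0 < pct < 0.05:
        return "< 0.1%"
    return f"{pct:.1f}%"


# Counts taken from the rate_code value table.
for value, count in [(1, 14_456_067), (2, 239_160), (5, 39_889), (0, 667)]:
    print(value, format_frequency(count))
```

For these four rows the sketch prints 97.8%, 1.6%, 0.3% and < 0.1%.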
store_and_fwd_flag
Categorical
High correlation Imbalance Missing
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 7326207 |
| Missing (%) | 49.6% |
| Memory size | 14.1 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | N |
|---|---|
| 2nd row | N |
| 3rd row | N |
| 4th row | N |
| 5th row | N |
Common Values
| Value | Count | Frequency (%) |
| N | 7285231 | 49.3% |
| Y | 165177 | 1.1% |
| (Missing) | 7326207 | 49.6% |
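The 84.7% imbalance reported for store_and_fwd_flag is consistent with one common definition of class imbalance: one minus the Shannon entropy of the non-missing class shares, normalised by the maximum entropy for that number of classes. The sketch below applies that definition to the N and Y counts above. That this is the exact formula used by the profiling tool is an assumption, and the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - normalised Shannon entropy: 0 when balanced, near 1 when one class dominates."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumed convention when there is only one class
    # Entropy in bits, skipping empty classes to avoid log(0).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Non-missing counts of N and Y from the Common Values table.
score = imbalance_score([7_285_231, 165_177])
print(f"{score:.1%}")
```

Only the 7450408 non-missing rows enter the calculation; the 7326207 missing values are reported separately.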
Most occurring characters
| Value | Count | Frequency (%) |
| N | 7285231 | 97.8% |
| Y | 165177 | 2.2% |
pickup_datetime
Date
| Distinct | 2303465 |
|---|---|
| Distinct (%) | 15.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 112.7 MiB |
| Minimum | 2013-01-01 00:00:00 |
|---|---|
| Maximum | 2013-01-31 23:59:59 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
dropoff_datetime
Date
| Distinct | 2305816 |
|---|---|
| Distinct (%) | 15.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 112.7 MiB |
| Minimum | 2013-01-01 00:00:36 |
|---|---|
| Maximum | 2013-02-01 10:33:08 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
High correlation
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.6973721 |
| Minimum | 0 |
|---|---|
| Maximum | 255 |
| Zeros | 166 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 255 |
| Range | 255 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.3653958 |
|---|---|
| Coefficient of variation (CV) | 0.80441749 |
| Kurtosis | 118.26646 |
| Mean | 1.6973721 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.6812626 |
| Sum | 25081414 |
| Variance | 1.8643057 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 1 | 10471701 | 70.9% |
| 2 | 1986196 | 13.4% |
| 5 | 920006 | 6.2% |
| 3 | 597485 | 4.0% |
| 6 | 520066 | 3.5% |
| 4 | 280992 | 1.9% |
| 0 | 166 | < 0.1% |
| 208 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 255 | 1 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 166 | < 0.1% |
| 1 | 10471701 | 70.9% |
| 2 | 1986196 | 13.4% |
| 3 | 597485 | 4.0% |
| 4 | 280992 | 1.9% |
| 5 | 920006 | 6.2% |
| 6 | 520066 | 3.5% |
| 9 | 1 | < 0.1% |
| 208 | 1 | < 0.1% |
| 255 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 255 | 1 | < 0.1% |
| 208 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 6 | 520066 | 3.5% |
| 5 | 920006 | 6.2% |
| 4 | 280992 | 1.9% |
| 3 | 597485 | 4.0% |
| 2 | 1986196 | 13.4% |
| 1 | 10471701 | 70.9% |
| 0 | 166 | < 0.1% |
trip_time_in_secs
Real number (ℝ)
High correlation
| Distinct | 6594 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 683.42359 |
| Minimum | 0 |
|---|---|
| Maximum | 10800 |
| Zeros | 34185 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 177 |
| Q1 | 360 |
| median | 554 |
| Q3 | 885 |
| 95-th percentile | 1614 |
| Maximum | 10800 |
| Range | 10800 |
| Interquartile range (IQR) | 525 |
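The interquartile range in the table above is Q3 minus Q1. One common use of it, shown here purely as an illustration and not as something the report computes, is Tukey's rule, which flags values more than 1.5 × IQR beyond the quartiles as potential outliers. The sketch uses the trip_time_in_secs quartiles quoted above.

```python
# Quartiles of trip_time_in_secs as quoted in the report.
q1 = 360
q3 = 885

# Interquartile range: spread of the middle 50% of values.
iqr = q3 - q1

# Tukey fences: values outside [lower_fence, upper_fence] are candidate outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}")                   # IQR = 525
print(f"Lower fence = {lower_fence}")   # Lower fence = -427.5
print(f"Upper fence = {upper_fence}")   # Upper fence = 1672.5
```

Under this rule the 95th percentile (1614 seconds) sits just inside the upper fence, while the reported maximum of 10800 seconds lies far beyond it.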
Descriptive statistics
| Standard deviation | 494.40626 |
|---|---|
| Coefficient of variation (CV) | 0.7234258 |
| Kurtosis | 10.977518 |
| Mean | 683.42359 |
| Median Absolute Deviation (MAD) | 252 |
| Skewness | 2.2749304 |
| Sum | 1.0098687 × 10^10 |
| Variance | 244437.55 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 360 | 552966 | 3.7% |
| 420 | 545596 | 3.7% |
| 300 | 533997 | 3.6% |
| 480 | 520228 | 3.5% |
| 540 | 483797 | 3.3% |
| 240 | 470535 | 3.2% |
| 600 | 440683 | 3.0% |
| 660 | 396544 | 2.7% |
| 720 | 354681 | 2.4% |
| 180 | 351204 | 2.4% |
| Other values (6584) | 10126384 | 68.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 34185 | 0.2% |
| 1 | 1432 | < 0.1% |
| 2 | 2867 | < 0.1% |
| 3 | 2212 | < 0.1% |
| 4 | 2121 | < 0.1% |
| 5 | 1549 | < 0.1% |
| 6 | 1238 | < 0.1% |
| 7 | 1165 | < 0.1% |
| 8 | 1099 | < 0.1% |
| 9 | 1041 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 10800 | 1 | < 0.1% |
| 10740 | 1 | < 0.1% |
| 10680 | 3 | < 0.1% |
| 10620 | 2 | < 0.1% |
| 10560 | 1 | < 0.1% |
| 10380 | 3 | < 0.1% |
| 10320 | 3 | < 0.1% |
| 10265 | 1 | < 0.1% |
| 10260 | 2 | < 0.1% |
| 10200 | 1 | < 0.1% |
trip_distance
Real number (ℝ)
High correlation
| Distinct | 4368 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.7709757 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 83376 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 1 |
| median | 1.7 |
| Q3 | 3.06 |
| 95-th percentile | 9.4 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 2.06 |
Descriptive statistics
| Standard deviation | 3.3059235 |
|---|---|
| Coefficient of variation (CV) | 1.193054 |
| Kurtosis | 29.057078 |
| Mean | 2.7709757 |
| Median Absolute Deviation (MAD) | 0.83 |
| Skewness | 3.8365388 |
| Sum | 40945641 |
| Variance | 10.92913 |
| Monotonicity | Not monotonic |
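The descriptive statistics are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. As a quick sanity check, this sketch recomputes both from the rounded values reported for trip_distance, so small differences in the last digits are expected.

```python
import math

# Values quoted in the trip_distance section of the report.
mean = 2.7709757
std_dev = 3.3059235
reported_cv = 1.193054
reported_variance = 10.92913

# Coefficient of variation: standard deviation relative to the mean.
cv = std_dev / mean

# Variance: square of the standard deviation.
variance = std_dev ** 2

print(f"CV = {cv:.6f} (reported {reported_cv})")
print(f"Variance = {variance:.5f} (reported {reported_variance})")

# The recomputed values agree with the report up to rounding.
assert math.isclose(cv, reported_cv, rel_tol=1e-5)
assert math.isclose(variance, reported_variance, rel_tol=1e-5)
```

The same identities explain the negative coefficients of variation on the longitude columns, where the mean is negative.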
Common values
| Value | Count | Frequency (%) |
| 0.9 | 351766 | 2.4% |
| 1 | 351498 | 2.4% |
| 0.8 | 345082 | 2.3% |
| 1.1 | 337293 | 2.3% |
| 1.2 | 322671 | 2.2% |
| 0.7 | 321976 | 2.2% |
| 1.3 | 304896 | 2.1% |
| 1.4 | 288759 | 2.0% |
| 0.6 | 280786 | 1.9% |
| 1.5 | 271872 | 1.8% |
| Other values (4358) | 11600016 | 78.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 83376 | 0.6% |
| 0.01 | 2968 | < 0.1% |
| 0.02 | 2479 | < 0.1% |
| 0.03 | 2508 | < 0.1% |
| 0.04 | 2979 | < 0.1% |
| 0.05 | 3643 | < 0.1% |
| 0.06 | 4085 | < 0.1% |
| 0.07 | 4447 | < 0.1% |
| 0.08 | 4814 | < 0.1% |
| 0.09 | 4739 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 100 | 5 | < 0.1% |
| 99.9 | 1 | < 0.1% |
| 99.8 | 1 | < 0.1% |
| 99.6 | 1 | < 0.1% |
| 99.3 | 1 | < 0.1% |
| 99.2 | 1 | < 0.1% |
| 99 | 1 | < 0.1% |
| 98.9 | 2 | < 0.1% |
| 98.8 | 1 | < 0.1% |
| 98.7 | 2 | < 0.1% |
pickup_longitude
Real number (ℝ)
High correlation Zeros
| Distinct | 40442 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -72.63634 |
| Minimum | -2771.2854 |
|---|---|
| Maximum | 112.40418 |
| Zeros | 267494 |
| Zeros (%) | 1.8% |
| Negative | 14509074 |
| Negative (%) | 98.2% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | -2771.2854 |
|---|---|
| 5-th percentile | -74.006592 |
| Q1 | -73.991882 |
| median | -73.981659 |
| Q3 | -73.966843 |
| 95-th percentile | -73.873047 |
| Maximum | 112.40418 |
| Range | 2883.6896 |
| Interquartile range (IQR) | 0.025039 |
Descriptive statistics
| Standard deviation | 10.138193 |
|---|---|
| Coefficient of variation (CV) | -0.13957466 |
| Kurtosis | 1821.8679 |
| Mean | -72.63634 |
| Median Absolute Deviation (MAD) | 0.011879 |
| Skewness | -2.3528999 |
| Sum | -1.0733192 × 10^9 |
| Variance | 102.78295 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 267494 | 1.8% |
| -73.982079 | 5506 | < 0.1% |
| -73.982239 | 5392 | < 0.1% |
| -73.982224 | 5371 | < 0.1% |
| -73.982208 | 5294 | < 0.1% |
| -73.982124 | 5118 | < 0.1% |
| -73.982262 | 5073 | < 0.1% |
| -73.982368 | 5001 | < 0.1% |
| -73.982094 | 4998 | < 0.1% |
| -73.9823 | 4994 | < 0.1% |
| Other values (40432) | 14462374 | 97.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -2771.2854 | 1 | < 0.1% |
| -2259.9832 | 1 | < 0.1% |
| -2249.2717 | 1 | < 0.1% |
| -2217.7666 | 1 | < 0.1% |
| -2211.8577 | 1 | < 0.1% |
| -2134.6482 | 1 | < 0.1% |
| -2113.6499 | 1 | < 0.1% |
| -2104.8601 | 1 | < 0.1% |
| -2014.3392 | 1 | < 0.1% |
| -2001.194 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 112.40418 | 1 | < 0.1% |
| 80.842125 | 1 | < 0.1% |
| 73.988716 | 1 | < 0.1% |
| 73.937798 | 1 | < 0.1% |
| 73.93779 | 1 | < 0.1% |
| 73.937775 | 1 | < 0.1% |
| 73.937759 | 1 | < 0.1% |
| 73.937752 | 1 | < 0.1% |
| 38.026054 | 1 | < 0.1% |
| 11.047888 | 1 | < 0.1% |
pickup_latitude
Real number (ℝ)
High correlation Skewed Zeros
| Distinct | 64511 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.014399 |
| Minimum | -3547.9207 |
|---|---|
| Maximum | 3310.3645 |
| Zeros | 265104 |
| Zeros (%) | 1.8% |
| Negative | 130 |
| Negative (%) | < 0.1% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | -3547.9207 |
|---|---|
| 5-th percentile | 40.70377 |
| Q1 | 40.735512 |
| median | 40.753147 |
| Q3 | 40.767288 |
| 95-th percentile | 40.787636 |
| Maximum | 3310.3645 |
| Range | 6858.2852 |
| Interquartile range (IQR) | 0.031776 |
Descriptive statistics
| Standard deviation | 7.7899041 |
|---|---|
| Coefficient of variation (CV) | 0.19467752 |
| Kurtosis | 75801.186 |
| Mean | 40.014399 |
| Median Absolute Deviation (MAD) | 0.01564 |
| Skewness | -127.80996 |
| Sum | 5.9127737 × 10^8 |
| Variance | 60.682605 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 265104 | 1.8% |
| 40.758148 | 2586 | < 0.1% |
| 40.758011 | 2584 | < 0.1% |
| 40.774109 | 2516 | < 0.1% |
| 40.774094 | 2477 | < 0.1% |
| 40.774101 | 2464 | < 0.1% |
| 40.774117 | 2464 | < 0.1% |
| 40.759426 | 2461 | < 0.1% |
| 40.774132 | 2419 | < 0.1% |
| 40.774078 | 2400 | < 0.1% |
| Other values (64501) | 14489140 | 98.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -3547.9207 | 1 | < 0.1% |
| -3447.9197 | 1 | < 0.1% |
| -3447.9177 | 1 | < 0.1% |
| -3447.9167 | 1 | < 0.1% |
| -3181.0781 | 1 | < 0.1% |
| -3127.637 | 1 | < 0.1% |
| -3115.2737 | 1 | < 0.1% |
| -3114.3157 | 1 | < 0.1% |
| -3114.2949 | 1 | < 0.1% |
| -3114.2922 | 3 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 3310.3645 | 1 | < 0.1% |
| 3210.3625 | 1 | < 0.1% |
| 3210.3447 | 1 | < 0.1% |
| 3210.344 | 1 | < 0.1% |
| 3124.198 | 1 | < 0.1% |
| 2317.6506 | 1 | < 0.1% |
| 2313.0457 | 1 | < 0.1% |
| 2210.175 | 1 | < 0.1% |
| 2210.1724 | 1 | < 0.1% |
| 2152.334 | 1 | < 0.1% |
dropoff_longitude
Real number (ℝ)
Zeros
| Distinct | 56249 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 86 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -72.594427 |
| Minimum | -2350.9556 |
|---|---|
| Maximum | 2228.7375 |
| Zeros | 275657 |
| Zeros (%) | 1.9% |
| Negative | 14500825 |
| Negative (%) | 98.1% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | -2350.9556 |
|---|---|
| 5-th percentile | -74.006927 |
| Q1 | -73.991211 |
| median | -73.980125 |
| Q3 | -73.963898 |
| 95-th percentile | -73.900284 |
| Maximum | 2228.7375 |
| Range | 4579.6931 |
| Interquartile range (IQR) | 0.027313 |
Descriptive statistics
| Standard deviation | 10.288603 |
|---|---|
| Coefficient of variation (CV) | -0.14172718 |
| Kurtosis | 1761.6191 |
| Mean | -72.594427 |
| Median Absolute Deviation (MAD) | 0.012726 |
| Skewness | 0.68020638 |
| Sum | -1.0726937 × 10^9 |
| Variance | 105.85536 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 275657 | 1.9% |
| -73.982239 | 4366 | < 0.1% |
| -73.982208 | 4264 | < 0.1% |
| -73.982079 | 4257 | < 0.1% |
| -73.982224 | 4222 | < 0.1% |
| -73.982368 | 4160 | < 0.1% |
| -73.982262 | 4055 | < 0.1% |
| -73.981956 | 4044 | < 0.1% |
| -73.982285 | 3993 | < 0.1% |
| -73.98233 | 3980 | < 0.1% |
| Other values (56239) | 14463531 | 97.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -2350.9556 | 1 | < 0.1% |
| -2343.4888 | 1 | < 0.1% |
| -2331.8333 | 1 | < 0.1% |
| -2236.5166 | 1 | < 0.1% |
| -2157 | 1 | < 0.1% |
| -2148.71 | 1 | < 0.1% |
| -2032.405 | 1 | < 0.1% |
| -2006.7937 | 1 | < 0.1% |
| -1991.0743 | 1 | < 0.1% |
| -1952.6053 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2228.7375 | 1 | < 0.1% |
| 2084.3 | 1 | < 0.1% |
| 1347.4446 | 1 | < 0.1% |
| 111.49388 | 1 | < 0.1% |
| 84.315735 | 1 | < 0.1% |
| 84.30883 | 1 | < 0.1% |
| 80.842125 | 1 | < 0.1% |
| 73.937798 | 1 | < 0.1% |
| 73.937759 | 3 | < 0.1% |
| 73.937752 | 1 | < 0.1% |
dropoff_latitude
Real number (ℝ)
High correlation Skewed Zeros
| Distinct | 88766 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 86 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 39.992189 |
| Minimum | -3547.9207 |
|---|---|
| Maximum | 3477.1055 |
| Zeros | 273357 |
| Zeros (%) | 1.8% |
| Negative | 108 |
| Negative (%) | < 0.1% |
| Memory size | 112.7 MiB |
Quantile statistics
| Minimum | -3547.9207 |
|---|---|
| 5-th percentile | 40.689629 |
| Q1 | 40.734684 |
| median | 40.75362 |
| Q3 | 40.768192 |
| 95-th percentile | 40.79287 |
| Maximum | 3477.1055 |
| Range | 7025.0262 |
| Interquartile range (IQR) | 0.033508 |
Descriptive statistics
| Standard deviation | 7.5370668 |
|---|---|
| Coefficient of variation (CV) | 0.18846347 |
| Kurtosis | 78387.389 |
| Mean | 39.992189 |
| Median Absolute Deviation (MAD) | 0.016384 |
| Skewness | -141.44463 |
| Sum | 5.9094575 × 10^8 |
| Variance | 56.807376 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 273357 | 1.8% |
| 40.758148 | 2630 | < 0.1% |
| 40.759426 | 2522 | < 0.1% |
| 40.758011 | 2474 | < 0.1% |
| 40.744915 | 1855 | < 0.1% |
| 40.758453 | 1767 | < 0.1% |
| 40.750149 | 1704 | < 0.1% |
| 40.750172 | 1671 | < 0.1% |
| 40.750118 | 1616 | < 0.1% |
| 40.750156 | 1606 | < 0.1% |
| Other values (88756) | 14485327 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -3547.9207 | 1 | < 0.1% |
| -3547.8953 | 1 | < 0.1% |
| -3481.1426 | 1 | < 0.1% |
| -3481.1343 | 1 | < 0.1% |
| -3347.9331 | 1 | < 0.1% |
| -3255.5735 | 1 | < 0.1% |
| -3117.5684 | 1 | < 0.1% |
| -3117.5374 | 1 | < 0.1% |
| -3117.5229 | 1 | < 0.1% |
| -3117.4885 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 3477.1055 | 1 | < 0.1% |
| 3210.3679 | 1 | < 0.1% |
| 3210.3381 | 1 | < 0.1% |
| 3177.1118 | 1 | < 0.1% |
| 1727.0167 | 1 | < 0.1% |
| 1705.8805 | 1 | < 0.1% |
| 1651.5535 | 1 | < 0.1% |
| 1442.6033 | 1 | < 0.1% |
| 1421.3934 | 1 | < 0.1% |
| 1330.6375 | 1 | < 0.1% |
Interactions
Correlations
| | dropoff_latitude | dropoff_longitude | passenger_count | pickup_latitude | pickup_longitude | rate_code | store_and_fwd_flag | trip_distance | trip_time_in_secs | vendor_id |
|---|---|---|---|---|---|---|---|---|---|---|
| dropoff_latitude | 1.000 | 0.476 | -0.005 | 0.506 | 0.229 | -0.088 | 1.000 | -0.063 | -0.104 | 0.002 |
| dropoff_longitude | 0.476 | 1.000 | -0.007 | 0.212 | 0.408 | 0.054 | 0.002 | 0.122 | 0.051 | 0.002 |
| passenger_count | -0.005 | -0.007 | 1.000 | -0.009 | -0.011 | 0.004 | 1.000 | 0.029 | 0.024 | 0.000 |
| pickup_latitude | 0.506 | 0.212 | -0.009 | 1.000 | 0.521 | -0.119 | 1.000 | -0.072 | -0.078 | 0.002 |
| pickup_longitude | 0.229 | 0.408 | -0.011 | 0.521 | 1.000 | 0.113 | 0.000 | 0.043 | 0.017 | 0.002 |
| rate_code | -0.088 | 0.054 | 0.004 | -0.119 | 0.113 | 1.000 | 0.000 | 0.152 | 0.155 | 0.001 |
| store_and_fwd_flag | 1.000 | 0.002 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.021 | 0.024 | 1.000 |
| trip_distance | -0.063 | 0.122 | 0.029 | -0.072 | 0.043 | 0.152 | 0.021 | 1.000 | 0.844 | 0.017 |
| trip_time_in_secs | -0.104 | 0.051 | 0.024 | -0.078 | 0.017 | 0.155 | 0.024 | 0.844 | 1.000 | 0.009 |
| vendor_id | 0.002 | 0.002 | 0.000 | 0.002 | 0.002 | 0.001 | 1.000 | 0.017 | 0.009 | 1.000 |
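Among pairs of numeric columns, the largest coefficient in the matrix is between trip_distance and trip_time_in_secs (0.844). To illustrate what such a coefficient measures, this sketch computes a plain Pearson correlation on the first ten sample rows listed in the Sample section. It is a toy calculation: the report's coefficient is computed over all observations and may use a different correlation measure, so the two numbers are not expected to match.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# trip_time_in_secs and trip_distance from sample rows 0 to 9 of the report.
trip_time = [382, 259, 282, 244, 560, 648, 418, 1898, 299, 957]
trip_distance = [1.0, 1.5, 1.1, 0.7, 2.1, 1.7, 0.8, 10.7, 0.8, 2.5]

r = pearson(trip_time, trip_distance)
print(f"Pearson r on the 10 sample rows: {r:.3f}")
```

With only ten rows the result is dominated by the single long trip (1898 seconds, 10.7 distance units), which is one reason small-sample correlations should be read with caution.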
Missing values
Sample
First rows
| | medallion | hack_license | vendor_id | rate_code | store_and_fwd_flag | pickup_datetime | dropoff_datetime | passenger_count | trip_time_in_secs | trip_distance | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 89D227B655E5C82AECF13C3F540D4CF4 | BA96DE419E711691B9445D6A6307C170 | CMT | 1 | N | 2013-01-01 15:11:48 | 2013-01-01 15:18:10 | 4 | 382 | 1.0 | -73.978165 | 40.757977 | -73.989838 | 40.751171 |
| 1 | 0BD7C8F5BA12B88E0B67BED28BEA73D8 | 9FD8F69F0804BDB5549F40E9DA1BE472 | CMT | 1 | N | 2013-01-06 00:18:35 | 2013-01-06 00:22:54 | 1 | 259 | 1.5 | -74.006683 | 40.731781 | -73.994499 | 40.750660 |
| 2 | 0BD7C8F5BA12B88E0B67BED28BEA73D8 | 9FD8F69F0804BDB5549F40E9DA1BE472 | CMT | 1 | N | 2013-01-05 18:49:41 | 2013-01-05 18:54:23 | 1 | 282 | 1.1 | -74.004707 | 40.737770 | -74.009834 | 40.726002 |
| 3 | DFD2202EE08F7A8DC9A57B02ACB81FE2 | 51EE87E3205C985EF8431D850C786310 | CMT | 1 | N | 2013-01-07 23:54:15 | 2013-01-07 23:58:20 | 2 | 244 | 0.7 | -73.974602 | 40.759945 | -73.984734 | 40.759388 |
| 4 | DFD2202EE08F7A8DC9A57B02ACB81FE2 | 51EE87E3205C985EF8431D850C786310 | CMT | 1 | N | 2013-01-07 23:25:03 | 2013-01-07 23:34:24 | 1 | 560 | 2.1 | -73.976250 | 40.748528 | -74.002586 | 40.747868 |
| 5 | 20D9ECB2CA0767CF7A01564DF2844A3E | 598CCE5B9C1918568DEE71F43CF26CD2 | CMT | 1 | N | 2013-01-07 15:27:48 | 2013-01-07 15:38:37 | 1 | 648 | 1.7 | -73.966743 | 40.764252 | -73.983322 | 40.743763 |
| 6 | 496644932DF3932605C22C7926FF0FE0 | 513189AD756FF14FE670D10B92FAF04C | CMT | 1 | N | 2013-01-08 11:01:15 | 2013-01-08 11:08:14 | 1 | 418 | 0.8 | -73.995804 | 40.743977 | -74.007416 | 40.744343 |
| 7 | 0B57B9633A2FECD3D3B1944AFC7471CF | CCD4367B417ED6634D986F573A552A62 | CMT | 1 | N | 2013-01-07 12:39:18 | 2013-01-07 13:10:56 | 3 | 1898 | 10.7 | -73.989937 | 40.756775 | -73.865250 | 40.770630 |
| 8 | 2C0E91FF20A856C891483ED63589F982 | 1DA2F6543A62B8ED934771661A9D2FA0 | CMT | 1 | N | 2013-01-07 18:15:47 | 2013-01-07 18:20:47 | 1 | 299 | 0.8 | -73.980072 | 40.743137 | -73.982712 | 40.735336 |
| 9 | 2D4B95E2FA7B2E85118EC5CA4570FA58 | CD2F522EEE1FF5F5A8D8B679E23576B3 | CMT | 1 | N | 2013-01-07 15:33:28 | 2013-01-07 15:49:26 | 2 | 957 | 2.5 | -73.977936 | 40.786983 | -73.952919 | 40.806370 |
Last rows
| | medallion | hack_license | vendor_id | rate_code | store_and_fwd_flag | pickup_datetime | dropoff_datetime | passenger_count | trip_time_in_secs | trip_distance | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14776605 | A8262FA0AFCB6C7229F6888EAFBDE076 | 1BDF89260FEF1AE6FDDE839A0278D31D | CMT | 2 | N | 2013-01-07 07:29:06 | 2013-01-07 08:19:39 | 1 | 3032 | 21.3 | -73.776794 | 40.645775 | -74.010933 | 40.704960 |
| 14776606 | A8262FA0AFCB6C7229F6888EAFBDE076 | 1BDF89260FEF1AE6FDDE839A0278D31D | CMT | 1 | N | 2013-01-07 14:30:23 | 2013-01-07 14:42:14 | 1 | 711 | 1.4 | -73.946114 | 40.801075 | -73.966530 | 40.805023 |
| 14776607 | F33EF464441839C6F0DABAABBC93B45D | 313F66DD09C308EADA3B307F6B8CF7A9 | CMT | 1 | N | 2013-01-10 10:56:47 | 2013-01-10 11:05:52 | 1 | 545 | 1.4 | -73.975410 | 40.759106 | -73.961830 | 40.776527 |
| 14776608 | 56CE01E7DBE0E6449FA1758F082D8884 | 4C6FE2FCFED26629D515D291EC1516A0 | CMT | 1 | N | 2013-01-10 14:50:01 | 2013-01-10 15:19:10 | 1 | 1748 | 4.0 | -73.957344 | 40.785732 | -73.994942 | 40.742931 |
| 14776609 | 32201027CDC62D654DC3AD9747A07C96 | B8DDB9F8143017E22104050B26C2A65D | CMT | 1 | N | 2013-01-05 08:58:18 | 2013-01-05 09:05:56 | 1 | 458 | 3.2 | -73.998901 | 40.734509 | -73.966820 | 40.770138 |
| 14776610 | B33E71CD9E8FE1BE3B70FEB6E807DD15 | BAF57796E45D921BB23217E17A372FF6 | CMT | 1 | N | 2013-01-06 04:58:23 | 2013-01-06 05:11:24 | 1 | 781 | 3.3 | -73.989029 | 40.759327 | -73.953743 | 40.770672 |
| 14776611 | ED160B76D5349C8AC1ECF22CD4B8D538 | 3B93F6DA5DEBDE9560993FA624C4FF76 | CMT | 1 | N | 2013-01-08 14:42:04 | 2013-01-08 14:50:27 | 1 | 503 | 1.0 | -73.993042 | 40.733990 | -73.982483 | 40.724823 |
| 14776612 | D83F9AC0E33F6F19869C243BE6AB6FE5 | 85A55B6772275374EF90AC9457DC1F83 | CMT | 1 | N | 2013-01-10 13:29:23 | 2013-01-10 13:34:45 | 1 | 321 | 0.9 | -73.979553 | 40.785011 | -73.968262 | 40.788158 |
| 14776613 | 04E59442A7DDBCE515E33CD355D866E7 | 7913172189931A1A1632562B10AB53C4 | CMT | 1 | N | 2013-01-06 16:30:15 | 2013-01-06 16:42:26 | 1 | 730 | 1.3 | -73.968002 | 40.762161 | -73.985992 | 40.770542 |
| 14776614 | D30BED60331C79E3F7ACD05B325ED42F | B5E1D2461A5BCC8819188DACEC17CD69 | CMT | 1 | N | 2013-01-05 20:38:46 | 2013-01-05 20:43:06 | 1 | 260 | 0.8 | -73.982224 | 40.766670 | -73.989212 | 40.773636 |
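In the sample rows, trip_time_in_secs closely tracks the difference between dropoff_datetime and pickup_datetime, agreeing exactly or to within one second on the rows checked here. A short sketch of that check follows; the helper name `elapsed_seconds` is hypothetical, and the relationship is only verified for these rows, not for the whole dataset.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(pickup: str, dropoff: str) -> int:
    """Seconds between two timestamps written as in the report's sample rows."""
    start = datetime.strptime(pickup, TIMESTAMP_FORMAT)
    end = datetime.strptime(dropoff, TIMESTAMP_FORMAT)
    return int((end - start).total_seconds())


# (pickup_datetime, dropoff_datetime, trip_time_in_secs) from sample rows 0, 1 and 3.
rows = [
    ("2013-01-01 15:11:48", "2013-01-01 15:18:10", 382),
    ("2013-01-06 00:18:35", "2013-01-06 00:22:54", 259),
    ("2013-01-07 23:54:15", "2013-01-07 23:58:20", 244),
]

for pickup, dropoff, reported in rows:
    computed = elapsed_seconds(pickup, dropoff)
    print(f"computed={computed} reported={reported} difference={computed - reported}")
```

The first two rows match exactly and the third differs by one second, so a consistency check on the full data would need a small tolerance.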